SC/60/BRG24
APPENDIX II
PHOTO-IDENTIFICATION SOFTWARE
FOR BOWHEAD WHALE IMAGES
GILBERT R. HILLMAN*, KELLY D. TRASK+, KATHRYN L. SWEENEY#, ANDREW R. DAVIS+,
WILLIAM R. KOSKI+, JULIE MOCKLIN# AND DAVID J. RUGH#
* 5030 Marathon Drive, Madison, WI, 53705, USA.
+ LGL Limited, environmental research associates, 22 Fisher St., P.O. Box 280, King City, Ontario, L7B 1A6,
Canada.
# National Marine Mammal Laboratory, Alaska Fisheries Science Center, National Marine Fisheries Service,
National Oceanic and Atmospheric Administration, 7600 Sand Point Way, NE, Seattle, WA 98115-0070, USA.
ABSTRACT
A computer program has been developed for identification of individual bowhead whales
in aerial photographs. The program enables the user to create a collection of images and
to compare an image to the collection. The collection is ranked by the program based on
similarity to the query image, facilitating identification of the individual in the image.
The program has been tested by the developer (GRH) with only a small number of
images, but it appears to function accurately. Eleven successive versions of the program
were provided to LGL Limited, who tested the program with a larger collection of images
and provided reports on bugs in the system and feature requests to the developer. These
bugs were fixed and features were added to make the program perform better, but further
refinements are likely to improve matching performance. Potential future
improvements will be identified when larger numbers of images are processed
through the program.
KEY WORDS: BOWHEAD WHALE, PHOTO-ID, MARK-RECAPTURE, ARCTIC,
COMPUTER-MATCHING
INTRODUCTION
The photographic database of the Bering-Chukchi-Beaufort Seas (BCBS) population of bowhead
whales (Balaena mysticetus) contains almost 18,000 images collected from the late 1970s to
2007. About half of the images that have been classified for image quality are of adequate
quality to determine if the whales are marked and could be recognized if they were photographed
in the future. All of the images in the database, but particularly the better quality images, have
provided useful information on life-history parameters including calving intervals (Miller et al.,
1992; Rugh et al., 1992a), growth rates (Koski et al., 1992, 1993), population structure (Davis et
al., 1983; Koski et al., 1993, 2006, 2007; Angliss et al., 1995), population size (Rugh, 1990; da
Silva et al., 2000; Schweder, 2003; Schweder and Sadykova, 2007; Koski et al., 2008) and
survival rates (Whitcher et al., 1996; Zeh et al., 2002). On-going studies continue to use these
data to further describe various parameters of bowhead whale life history and population
dynamics (Rugh et al., 2007).
The size of the bowhead photographic database has increased to the point where collection
of new photographs results in a relatively high probability of re-sighting a previously
photographed marked whale. This is because the proportion of the population that has been
photographed has increased, and consequently, the likelihood of photographing a whale that has
been photographed during previous studies has increased dramatically. Thus new photographs at
this time provide much more information on individual whales and the population than the same
number of photographs did during the early years of bowhead whale photography studies
(Schweder and Sadykova, MS). However, as the number of marked whales in the collection has
increased, the time to match each new photograph to earlier marked whales has also increased.
To assist with the matching of the rapidly-growing number of photographs in the BCBS
photographic collection, we have developed a computer-assisted matching program. This paper
describes the development and current capabilities of the program and discusses future testing
and development that will improve the efficiency of the program.
METHODS
Programming
All programming was done using the Matlab (The Mathworks, Inc., Natick, MA) development
system. The program was written using Matlab’s scripting language and graphical user interface
(GUI) developer. The resulting files can be provided to any potential user who has a current
Matlab installation. They can also be compiled by the programmer to produce MS Windows
executable files, which run on any Windows XP or Vista machine. Executables for Mac and
Linux systems could also be created if needed. A Matlab runtime file, which the programmer is
permitted by Matlab to distribute to end-users, must also be installed along with the executable
file. For this project, the Matlab runtime and the executable files were sent to LGL Limited
(LGL) and the National Marine Mammal Laboratory (NMML), and personnel there successfully
installed and ran the program.
Entering images into the program
Bowhead whale images are opened in the matching program and can be TIF or JPG files; other
common file formats could be accommodated. When an image is opened, the program
automatically creates specific subfolders into which the program will insert metadata
(information about the images). The metadata from all subsequent images in the same folder will
be added to these subfolders.
Once an image is open, the user selects “Geometric Transform” from a menu. The image appears
in a new window next to a provided “standard” image, which is simply a whale image of
excellent quality. The user clicks on two points in the new image, such as the tip of the rostrum
and the end of the fluke, and on the corresponding two points in the standard image. The
program scales, rotates, and translates the new image so that it will be superimposable on the
standard image, and therefore on all other images normalized to the same standard image. The
normalization parameters are saved and are applied to that image each time it is read by the
program.
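The two-point normalization can be illustrated with a minimal sketch. It is written in Python rather than Matlab and is not the program's code; the function name `two_point_similarity` and the use of complex arithmetic are assumptions made for illustration. Two pairs of corresponding points are enough to fix the scale, rotation and translation of the mapping.

```python
import cmath


def two_point_similarity(src_a, src_b, dst_a, dst_b):
    """Build a scale-rotate-translate map from two point correspondences.

    src_a, src_b: (x, y) points clicked on the new image (e.g. rostrum, fluke).
    dst_a, dst_b: the corresponding (x, y) points on the standard image.
    Treating a point as a complex number z, the map has the form z -> a*z + b.
    """
    p1, p2 = complex(*src_a), complex(*src_b)
    q1, q2 = complex(*dst_a), complex(*dst_b)
    if p1 == p2:
        raise ValueError("the two clicked points must be distinct")
    a = (q2 - q1) / (p2 - p1)  # abs(a) is the scale, phase(a) the rotation
    b = q1 - a * p1            # translation

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform, abs(a), cmath.phase(a)
```

Calling `two_point_similarity((0, 0), (10, 0), (5, 5), (5, 25))` returns a mapping that doubles the size of the image and rotates it by a quarter turn; the returned function can then be applied to the location of any mark.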
Some images were not sufficiently normalized by this procedure because the whale was flexed or
twisted in the image. Therefore, an “Un-twist” option was added to the image capture procedure.
The user selects “Un-twist” from the menu and then clicks on a point on the rostrum, a series of
at least four points along the anatomic midline and one at the base of the fluke. The program
performs a nonlinear distortion of the image that causes the designated points to fall on the line
between the rostrum and the fluke, with interpolation of all pixels between the selected points.
The user then selects the markings on the whale that are to be used for identification. First, a
mouse click selects each mark. If the mark is a small dot, no further action is needed. If it is a
large mark or has a distinctive shape, the user may outline the mark, and data about its size and
shape are stored for use in matching. A capability has been added to review an image and its
stored marks, and to add new selections, move, edit or delete previously selected marks without
re-doing the entire mark selection process for that image. This feature allows updating of mark
information for a whale from second and subsequent photos of the same whale. This ensures that
the mapped image is an accurate representation of a whale and permits us to incorporate
information from multiple images of a whale into a single image for matching.
Because some regions of a whale are not clearly visible on some photographs, the user may
designate all or part of each image as a region of interest (ROI) for matching. If such a region is
designated, the program will disregard marks in another image that are outside of the ROI during
comparison of the two images. An ROI should include only those portions of the image that are not
obscured by foam, glare, etc. If no ROI is selected, the default is the entire image.
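The ROI test can be sketched as a point-in-polygon check. The sketch below is in Python and is not the program's code; it assumes that the ROI is stored as a list of polygon vertices and that each mark is reduced to its (x, y) location, and the function names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in drawing order.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def marks_inside_roi(marks, roi_polygon=None):
    """Keep only the marks that fall inside the ROI of the other image.

    With no ROI (None) the whole image is valid, so every mark is kept.
    """
    if roi_polygon is None:
        return list(marks)
    return [m for m in marks if point_in_polygon(m, roi_polygon)]
```

For a square ROI with corners (0, 0) and (10, 10), a mark at (5, 5) is kept and a mark at (15, 5) is disregarded during the comparison.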
The scaling and orientation, un-twisting, ROI, and locations and descriptions of the
distinguishing marks are automatically stored in the subfolders, and are recalled by the program
as needed.
When this pre-processing of an image is complete, the user selects “Add to Excel File.” This
adds the file name of the new image to a list maintained in a MS Excel file; this file stores the
names of all files in the catalogue and the identity of the whale in the image. When the image is
first added to the file, the program adds the identifier “0000” as its ID, later to be changed by the
user when the image is identified either as a new individual or a match to a previous image. The
files listed in the Excel document are the ones to which each new image is matched.
Matching Images
The user selects any image that has previously been entered into the matching program. If the
normalization and spot marking procedures have not been completed for the image, a notice
reminds the user to complete them.
The size, orientation and shape of scars are sometimes distinctive and incorporation of these
attributes into the matching process should increase the reliability of matching two photographs
of the same whale. The user is therefore given an option to more heavily weight mark attributes
that have been previously entered for the image, when the higher weighting is likely to improve
the ability of the program to identify a match. For example, this option would be used if a whale
had a distinctively-shaped mark so that all whales with a similar-shaped mark at the same
location would be ranked higher on the list of potential matches than they might be without
selection of this option. The parameters that are measured for each mark are: location, area
(number of pixels), eccentricity (a number between 0, indicating a circle, and 1, indicating a
straight line), orientation (angle in degrees from horizontal of the major axis), length (the major
axis of an ellipse that approximates the mark) and perimeter. When marks are matched, the
difference between them is the Euclidean distance in the seven-dimensional space of these
parameters (x position, y position, area, eccentricity, eccentricity×orientation [i.e., if a mark is
round, its orientation is not relevant], length and perimeter). Each of the parameters is weighted by a
default value. The default weight is 1 for all parameters except eccentricity, for which it is 10
(because its native value is limited to the range 0 to 1), and area, for which it is 0.01. The user
can set a weight to zero so that a parameter does not count at all, or to any higher number so that
the parameter is weighted more heavily. The program saves the new weights in a file called “wtfile.mat.”
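A minimal Python sketch of the weighted distance between two marks follows. It is an illustration, not the program's code: the dictionary keys are hypothetical names, and the sketch assumes that each weight multiplies the difference in its parameter before the differences are squared and summed, which the description above does not specify.

```python
import math

# Default weights as described in the text; the key names are assumptions.
DEFAULT_WEIGHTS = {
    "x": 1.0,
    "y": 1.0,
    "area": 0.01,
    "eccentricity": 10.0,
    "ecc_orientation": 1.0,
    "length": 1.0,
    "perimeter": 1.0,
}


def feature_vector(mark):
    """Map a mark's measured parameters onto the seven matching dimensions."""
    return {
        "x": mark["x"],
        "y": mark["y"],
        "area": mark["area"],
        "eccentricity": mark["eccentricity"],
        # Orientation is scaled by eccentricity so that it vanishes for round marks.
        "ecc_orientation": mark["eccentricity"] * mark["orientation"],
        "length": mark["length"],
        "perimeter": mark["perimeter"],
    }


def mark_distance(mark_a, mark_b, weights=DEFAULT_WEIGHTS):
    """Weighted Euclidean distance between two marks in the 7-D parameter space."""
    fa, fb = feature_vector(mark_a), feature_vector(mark_b)
    return math.sqrt(sum((weights[k] * (fa[k] - fb[k])) ** 2 for k in weights))
```

Setting a weight to zero removes that parameter from the distance; for example, passing `dict(DEFAULT_WEIGHTS, area=0.0)` makes the comparison ignore mark area.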
Any two images may be compared by reading one as “Image 1” and the other as “Image 2.” A
“Compare two images” button measures and displays the difference between the two images.
During testing of the program, this single matching process was useful for examining how the
match was affected by the way spots were marked.
In the usual operation of the program, the user does not open an “Image 2”, but instead, clicks a
“Compare to List” button. In this case, all of the images listed in the Excel file are matched to
Image 1. The program measures the dissimilarity between the query image and each listed
image, and it sorts the entire list by the dissimilarity parameter, with the images that are most
similar to the query image at the top of the list. It then displays a window with the sorted list and
the dissimilarity value. The program also displays, as an array of small images, the six images
that are most similar to Image 1, showing the file names and IDs (Fig. 19). Each small image has
a button that allows its image to be displayed in a separate large window, with its selected marks
shown. If the user sees a match among those six images, the user enters the ID of the query image
in a provided text box. If a match is not seen, the user may view the next six. The user can scroll
through the catalogue of images in either direction, displaying them six at a time, until a match is
found, or the user is convinced that there is no match. Either a pre-existing or a new ID may be
entered, and it is recorded in the Excel file instead of the default “0000”.
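The ranking and six-at-a-time display can be sketched as follows. This Python fragment is illustrative only; the function names and the dictionary of dissimilarity values keyed by file name are assumptions, not the program's data structures.

```python
def rank_catalogue(dissimilarity_by_file):
    """Sort catalogue entries so the most similar image (lowest value) comes first."""
    return sorted(dissimilarity_by_file.items(), key=lambda item: item[1])


def page_of_candidates(ranked, page_index, page_size=6):
    """Return one screenful of candidates; page 0 holds the six best matches."""
    start = page_index * page_size
    return ranked[start:start + page_size]
```

The user would inspect `page_of_candidates(ranked, 0)` first and move to later pages only if no match is seen.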
The match or absence of a match is confirmed by a trained observer with assistance of the
program. The goal of the program is to reduce the number of images that must be reviewed. If
the program worked perfectly, its first suggestion would always be a match, if one existed. This
ideal goal cannot always be achieved because some matching errors are inevitable, some images
are unclear and some whales either lack distinctive markings or have markings resembling those
on other individuals. Nevertheless, if a correct match occurs at or near the top of the list, much
labour will be saved.
Comparison to earlier programs
The overall design of the program is based on previous photo-ID programs that were developed
for other species such as dolphins, sea otters, and gray whales (Hillman et al., 2003; Markowitz
et al., 2003). The bowhead program code is entirely new, with many features that are specific to
the bowhead whale application.
The processes of file access, Excel file access, image display and mouse-use take advantage of
Matlab functions. The scale-rotation-translation normalization process uses Matlab’s linear
conformal transformation. However, the un-twisting function is a new algorithm. When the
control points have been selected, the image is rotated such that the first and last points fall on
the same horizontal row. A spline is then drawn through the intermediate points, with a spline
point computed at each pixel column. Each column is then revolved along its vertical length so
that the spline points all fall on the horizontal between the snout and the tail. The entire image is
then rotated back to its original orientation. This process is fairly slow (at least several seconds),
so it should not be used unless there is enough curvature or twisting to disrupt the location of
identifying marks.
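The column-shifting step of the un-twisting procedure can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the program's algorithm: the image is a list of pixel rows, the initial rotation has already been applied so that the first and last control points lie on the same row, piecewise-linear interpolation stands in for the spline, and each column is shifted circularly. The function names are hypothetical.

```python
def midline_height(x, points):
    """Row of the midline at column x, by piecewise-linear interpolation.

    A stand-in for the spline; points are (column, row) pairs with strictly
    increasing columns.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the control points")


def untwist(image, points):
    """Shift each pixel column so the midline falls on one horizontal row.

    image: list of rows (lists of pixel values).
    points: control points from rostrum to fluke, first and last on the same row.
    """
    baseline = points[0][1]
    height = len(image)
    result = [row[:] for row in image]
    for x in range(points[0][0], points[-1][0] + 1):
        # Number of rows this column must move to bring the midline to the baseline.
        shift = round(baseline - midline_height(x, points))
        for y in range(height):
            result[(y + shift) % height][x] = image[y][x]
    return result
```

Columns outside the span of the control points are left unchanged in this sketch.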
Matching function
The matching function works as follows, with the process repeated as the query image is
matched in turn to each image in the collection:
The program recalls the locations and shape properties of the two images (A and B) being
matched.
If A or B has a designated valid region, the marks in the other image are tested to
determine whether they fall within this region. Any marks that are outside the other image’s
valid region are temporarily discarded.
The Euclidean distance is determined between each point in A and each point in B in a 7-dimensional
space comprising the X distance, Y distance, mark area, mark eccentricity (a scale from 0 to 1
indicating shape from a circle to a line), angular orientation of the mark, length of the major axis
of the mark, and perimeter of the mark. The closest points are determined. If the images have
unequal numbers of marks, the smaller number is used (e.g. if A has 3 and B has 5 marks, the
closest 3 pairs of points are found).
Using a linear translation function and Matlab’s optimization algorithm, the relative
position of the two images is adjusted to minimize the distances between the corresponding
closest-fit points.
The difference values between the query and all other images are accumulated and sorted,
establishing the order in which images are presented for inspection.
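The steps above can be sketched in Python as follows. The sketch is illustrative and makes several assumptions that are not taken from the program: only mark positions are used (not the full seven parameters), pairs are chosen greedily with the nearest pair first, the translation is computed in closed form as the mean offset (which minimizes the summed squared distance) rather than with Matlab's optimization routine, and the difference value is the mean residual distance. All function names are hypothetical.

```python
import math


def closest_pairs(marks_a, marks_b):
    """Greedily pair marks, nearest first; yields min(len(a), len(b)) pairs."""
    candidates = sorted(
        (math.dist(a, b), i, j)
        for i, a in enumerate(marks_a)
        for j, b in enumerate(marks_b)
    )
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((marks_a[i], marks_b[j]))
    return pairs


def best_translation(pairs):
    """Translation of A minimizing the summed squared distance to B."""
    n = len(pairs)
    dx = sum(b[0] - a[0] for a, b in pairs) / n
    dy = sum(b[1] - a[1] for a, b in pairs) / n
    return dx, dy


def dissimilarity(marks_a, marks_b):
    """Mean distance between paired marks after the best translation."""
    pairs = closest_pairs(marks_a, marks_b)
    if not pairs:
        return math.inf
    dx, dy = best_translation(pairs)
    return sum(math.dist((a[0] + dx, a[1] + dy), b) for a, b in pairs) / len(pairs)
```

Evaluating `dissimilarity` between the query image and every catalogue image, and sorting the results in ascending order, gives the order in which images would be presented for inspection.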
A second matching algorithm was also implemented after discussion with an engineering
colleague, Dr. Hemant Tagare at Yale. Rather than the successive-approximations optimization
of the point locations, it uses an analytical minimization of the summed distance, with a specified
number of iterations. The advantage of this method is that it can weight the distances in such a
way that the matching value is weighted more heavily by points that fit more closely than those
that are distant matches. The result of this weighting is to reduce the influence of outlying marks
that might be due to errors or artifacts. The user of the program can select either of the two
matching algorithms; experience will tell whether one is actually more effective than the other.
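The idea behind the second algorithm can be illustrated with an iteratively reweighted estimate of the translation. The Python sketch below is not the implemented method: the weighting function 1/(1 + d²/scale²), the `scale` parameter and the function name are assumptions chosen only to show how closely fitting pairs can dominate the result. Each iteration solves the weighted least-squares problem for the translation in closed form.

```python
def robust_translation(pairs, iterations=10, scale=1.0):
    """Estimate the translation from A to B while down-weighting outlying pairs.

    pairs: list of ((xa, ya), (xb, yb)) matched mark positions.
    iterations: fixed number of reweighting passes.
    scale: residual distance at which a pair's weight falls to one half.
    """
    offsets = [(b[0] - a[0], b[1] - a[1]) for a, b in pairs]
    n = len(offsets)
    # Start from the unweighted mean offset.
    tx = sum(dx for dx, _ in offsets) / n
    ty = sum(dy for _, dy in offsets) / n
    for _ in range(iterations):
        # Pairs that already fit well receive weights near 1; distant ones near 0.
        weights = [
            1.0 / (1.0 + ((dx - tx) ** 2 + (dy - ty) ** 2) / scale ** 2)
            for dx, dy in offsets
        ]
        total = sum(weights)
        # Closed-form weighted least-squares solution for the translation.
        tx = sum(w * dx for w, (dx, _) in zip(weights, offsets)) / total
        ty = sum(w * dy for w, (_, dy) in zip(weights, offsets)) / total
    return tx, ty
```

With four pairs that agree on an offset of (2, 1) and one spurious pair offset by (30, 30), the unweighted mean is (7.6, 6.8), whereas the reweighted estimate settles close to (2, 1).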
DISCUSSION
A computer-assisted matching program has been developed and has been tested with a relatively
small number of photographs. The program runs without errors when used correctly and
identifies images quite accurately within the test dataset, but it has rough edges; for example,
there is no internal help file and user error trapping is not complete. The latest version of the
program was provided to LGL and NMML on 23 April 2008.
More study is needed of the weights that control the relative influence of distance, area, etc. of
marks on the similarity of a matched pair of images; it is still uncertain what values to assign to
these weights in order to provide the most accurate matching. Most importantly, there must be
testing of the program with a large collection of images that have already been matched to
determine how far down the list of potential matches actual matches may occur.
Additional program bugs will inevitably be found as more extensive testing is carried out, and
these must be corrected. Some improvements are needed, such as a more flexible system of
organizing the data files to facilitate the matching process. Additional features that may improve
the reliability of the matching process have been requested by the testers at LGL and NMML.
Throughout the development process there has been frequent and effective communication
between the programmer and the testing groups, and the result is a working draft of a program
that may be extremely helpful for matching bowhead whale images.
ACKNOWLEDGEMENTS
Development and testing of the computer-assisted matching program was funded by the Minerals
Management Service (MMS) through an inter-agency agreement between MMS and NMML.
We thank Chuck Monnett of MMS for administering the contract and for comments on earlier drafts
of this paper.
REFERENCES
Angliss, R.P., Rugh, D.J., Withrow, D.E. and Hobbs, R.C. 1995. Evaluations of aerial
photogrammetric length measurements of the Bering-Chukchi-Beaufort seas stock of bowhead
whales (Balaena mysticetus). Rep. int. Whal. Commn 45:313-24.
da Silva, C.Q., Zeh, J., Madigan, D., Laake, J., Rugh, D., Baraff, L., Koski, W. and Miller, G.
2000. Capture-recapture estimation of bowhead whale population size using photo-
identification data. J. Cetacean Res. Manage. 2(1):45-61.
Davis, R.A., Koski, W.R. and Miller, G.W. 1983. Preliminary assessment of the length-
frequency distribution and gross annual recruitment rate of the western arctic bowhead whale as
determined with low-level aerial photogrammetry, with comments on life history. Rep. to Natl
Mar. Mammal Lab., Natl Mar. Fish. Serv., Seattle, WA by LGL Ltd. Paper SC/35/PS5
presented to the IWC Scientific Committee, June 1983 (unpublished). 91pp.
Hillman, G.R., Würsig, B., Gailey, G.A., Kehtarnavaz, N., Drobyshevsky, A., Araabi, B.N.,
Tagare, H.D. and Weller, D.W. 2003. Computer-assisted photo-identification of individual
marine vertebrates: a multi-species system. Aquatic Mammals 29(1):117-123.
Koski, W.R., Davis, R.A., Miller, G.W. and Withrow, D.E. 1992. Growth rates of bowhead
whales as determined from low-level aerial photogrammetry. Rep. int. Whal. Commn 42:491-499.
Koski, W.R., Davis, R.A., Miller, G.W. and Withrow, D.E. 1993. Reproduction. pp. 239-74. In:
J.J. Burns, J.J. Montague and C.J. Cowles (eds) The Bowhead Whale. Special Publication. No.
2. Society for Marine Mammalogy, Lawrence, Kansas. 787pp.
Koski, W.R., Rugh, D.J., Punt, A.E. and Zeh, J. 2006. An approach to minimize bias in
estimation of the length-frequency distribution of bowhead whales (Balaena mysticetus) from
aerial photogrammetric data. J. Cetacean Res. Manage. 8:45-54.
Koski, W.R., George, J.C. and Zeh, J. 2007. A calf index for monitoring reproductive success in
BCB bowhead whales (Balaena mysticetus). Paper SC/59/BRG7 presented to the Int. Whal.
Commn Scientific Committee, Anchorage, AK, 7-18 May 2007. 6pp.
Koski, W.R., Mocklin, J., Davis, A.R., Zeh, J., Rugh, D.J., George, J.C. and Suydam, R. 2008.
Estimates of Bering-Chukchi-Beaufort bowhead whale (Balaena mysticetus) abundance from
2003-2005 photo-identification data. Paper SC/60/BRG18 presented to the Int. Whal. Commn
Scientific Committee, Santiago, Chile, 1-13 June 2008. 6pp.
Markowitz, T.M., Harlin, A.D. and Würsig, B. 2003. Digital photography improves efficiency of
individual dolphin identification. Marine Mammal Science 19(1):217-223.
Miller, G.W., Davis, R.A., Koski, W.R., Crone, M.J., Rugh, D.J., Withrow, D.E. and Fraker, M.
1992. Calving intervals of bowhead whales -- An analysis of photographic data. Rep. int. Whal.
Commn 42:501-506.
Rugh, D. 1990. Bowhead whales reidentified through aerial photography near Point Barrow,
Alaska. Rep. int. Whal. Commn (Spec. Issue 12):289-94.
Rugh, D.J., Miller, G.W., Withrow, D.E. and Koski, W.R. 1992a. Bowhead whale calving
intervals established through photographic reidentifications. J. Mammal. 73:487-490.
Rugh, D., Koski, W., and George, J. 2007. Interyear re-identification of bowhead whales during
their spring migration past Barrow, Alaska, 1984-2004. Paper SC/59/BRG2 presented to the
IWC Scientific Committee, Anchorage, AK, 7-18 May 2007. 9pp.
Schweder, T. 2003. Abundance estimation from multiple photo surveys: confidence distributions
and reduced likelihoods for bowhead whales off Alaska. Biometrics 59:974-83.
Schweder, T. and Sadykova, D. 2007. Event history models for capture-recapture surveys with
passively marked individuals. Paper SC/59/BRG28 presented to the Int. Whal. Commn
Scientific Committee, Anchorage, AK, May 2007. 14pp.
Schweder, T. and Sadykova, D. MS. Breaking the square root law. (unpublished). 9pp.
Whitcher, B., Zeh, J., Rugh, D., Koski, W. and Miller, G. 1996. Estimation of adult bowhead
whale survival rates. Paper SC/48/AS12 presented to the IWC Scientific Committee, September
1997 (unpublished). 22pp.
Zeh, J., Poole, D., Miller, G., Koski, W., Baraff, L. and Rugh, D. 2002. Survival of bowhead
whales, Balaena mysticetus, estimated from 1981-1998 photoidentification data. Biometrics
58:832-40.
Fig. 1 - OPENING: When opening an unprocessed image in the Compare program, two prompts
provide a warning that the image needs to be transformed and the spots need to be marked.
Fig. 2 - GEOMETRIC TRANSFORM: To scale the whale brought up as Image 1, click ‘Input
Processing’ in the menu and then ‘Geometric Transform.’ Two images appear, Image 1 (Fig. 2) and
the Reference Image (Fig. 3). To scale Image 1, click the snout (left click) then the tail (right
click). On the Reference Image, click the same points on the snout (left click) and tail (right
click) that were clicked on Image 1.
Fig. 3 - GEOMETRIC TRANSFORM: This is the reference image to which Image 1 is scaled.
On this reference image, click the same points on the snout (left click) and tail (right click)
that were clicked on Image 1.
Fig. 4 - GEOMETRIC TRANSFORM: After the last click on the tail end of the Reference Image, the
two image windows disappear, the transformed image is shown in the main window and the user
is prompted to save it.
Fig. 5 - UNTWIST: To untwist the whale in Image 1, choose ‘Input Processing’ then ‘Untwist’ and
Image 1 will appear in a window. To untwist the whale, six or more asterisks must be placed by
clicking at evenly spaced points down the midline of the whale. The program takes about 15 seconds
to untwist a whale image. Because of the time required, the Untwist feature should be used only
if spots on the whale would otherwise be displaced and could not be compared to a ‘straight’ whale.
Fig. 6 - SELECT SPOTS: At this stage, the whale image is untwisted. To indicate permanent
marks and scars, there is a ‘Select Spots’ command in the ‘Input Processing’ menu option.
Fig. 7 - SELECT SPOTS: When selected, image 1 will pop up, and all scars need to be indicated
by setting the asterisk with the mouse. When in doubt whether or not a mark is real, it should be
indicated as a mark. After all the marks are indicated, the ones that are larger than an asterisk
should be further processed (Fig. 8).
Fig. 8 - SELECT SPOTS: If a mark is larger than the asterisk or has a unique shape other than a
circle, the mark can be zoomed in and the perimeter of the mark outlined. When you right click
near the image, select ‘Mark Attributes’ to outline the perimeter of the mark (Fig. 9).
Fig. 9 - SELECT SPOTS: This is the mark from Fig. 8 being outlined to indicate its uniqueness.
Fig. 10 - IMAGE 1 WITH POINTS: When the Image 1 window has been closed, the main screen
will be visible. At this stage, the image may be reviewed to add, move, remove, or outline more
marks. Go to ‘Viewers’ in the Menu and select ‘Image 1 w/ points.’ Image 1 window will
appear in a larger window (Fig. 11).
Fig. 11 - IMAGE 1 WITH POINTS: This is Image 1, ready to be reviewed by zooming in and
making changes. By right-clicking on Image 1, a menu will appear with the options to ‘add
point,’ ‘move point,’ ‘remove point,’ ‘mark attributes’ (outlining unique or large marks) and
finally ‘show attributes’ (allows you to see characteristics of a mark, such as perimeter, area,
eccentricity, length, and orientation).
Fig. 12 - REGION OF INTEREST: If Image 1 is an incomplete picture of a whale where the
whole body is not visible due to splash, ice, or placement, the ‘Region of Interest’ (ROI) may be
indicated. This is only applied if you cannot tell if there are marks on the part of the whale that
is obscured. This process is done after scaling, untwisting, and mark selection have been
completed (Fig. 13).
Fig. 13 - REGION OF INTEREST: This is Image 1 after it has been scaled (‘Geometric
Transform’) and all marks have been indicated and outlined. It is not necessary to ‘Untwist’ this
image since we are not using the part of the whale that is not clearly visible, therefore we will
only mark the ROI which will be used in the matching part of the program.
Fig. 14 - REGION OF INTEREST: To mark the ROI, go to ‘Input Processing’ and then select
‘Mark ROI.’ Image 1 will appear in a larger window (Fig. 15).
Fig. 15 - REGION OF INTEREST: Similar to how a unique mark is outlined, an outline will be
drawn around the region of the whale that can clearly be seen and has already been marked.
Fig. 16 - REGION OF INTEREST: Once the ROI has been outlined, double-click inside the
outlined area; the ROI will then be displayed so that the marked area can be reviewed. To go back
to the main window, simply close this Image 1 window; the ROI is saved automatically.
Fig. 17 - REGION OF INTEREST: This is the final view of Image 1 with the marks and the
indicated ROI around the tail and fluke.
Fig. 18 - ADD TO EXCEL FILE: When the image is done being processed, go to ‘File’ and
select “Add to Excel file.” This copies the image name and pathway to an Excel spreadsheet
used in running the matching part of the program.
Fig. 19 - COMPARE TO LIST: To run the program to produce a list of possible matches, click
the ‘Compare to List’ button on the lower right of the main screen (Fig. 1). The list is presented
showing six images at a time in order of most likely matches to least likely matches.
Fig. 20 - COMPARE TO LIST: From the window with the list of six images (Fig. 19), an image
can be enlarged and compared to Image 1. When you click ‘Enlarge’, you can see the points
from Image 1 overlapped on the enlarged image from the list. From this point, a trained
technician will sort through all the possible matches to confirm if there are true matches.